Name | Version | Summary | Date |
reliq | 0.0.28 | Python ctypes bindings for reliq | 2024-12-21 20:22:27 |
indoxMiner | 0.1.0 | Indox Data Extraction | 2024-12-19 14:09:47 |
yurenizer | 0.2.2 | A library for standardizing terms with spelling variations using a synonym dictionary. | 2024-12-08 08:03:52 |
chonkie | 0.2.2 | 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library | 2024-12-06 22:57:29 |
huggingface-text-data-analyzer | 1.1.0 | A comprehensive tool for analyzing text datasets from HuggingFace's datasets library | 2024-12-06 03:06:41 |
ts-tokenizer | 0.1.17 | TS Tokenizer is a hybrid (lexicon-based and rule-based) tokenizer designed specifically for tokenizing Turkish texts. | 2024-12-03 09:31:39 |
pawpaw | 1.0.0rc8 | High Performance Text Processing & Segmentation Framework | 2024-11-18 04:28:31 |
analiticcl | 0.4.8 | Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation | 2024-10-17 19:47:08 |
sesdiff | 0.3.2 | Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (Levenshtein). This is the Python binding. | 2024-10-15 08:59:03 |
abbreviation-extractor | 0.1.4 | A library for extracting abbreviations from text. | 2024-09-14 20:02:14 |
html-to-markdown | 1.1.0 | Convert HTML to markdown | 2024-09-09 06:26:33 |
tokenize-text | 0.2.32 | Tokenizing and processing text inputs with transformer models | 2024-08-11 21:53:24 |
tokenize-transformer | 0.2.14 | Tokenizing and processing text inputs with transformer models | 2024-08-01 18:39:29 |
flashtext2 | 1.1.0 | A package for extracting keywords from large text very quickly (much faster than regex and the original flashtext package) | 2024-07-04 14:40:37 |
roter | 2024.6.25 | Rotate and combine tables (Danish: Roter og kombiner borde). | 2024-06-24 19:22:08 |
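The sesdiff entry mentions the Levenshtein edit distance. For reference, here is a minimal Python sketch of that metric. It uses the classic Wagner-Fischer dynamic program and is not sesdiff's API, because sesdiff derives its edit script with Myers' diff algorithm. The function name `levenshtein` is a hypothetical choice made for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (Wagner-Fischer DP).
    Hypothetical helper for illustration; not part of sesdiff."""
    # prev[j] = distance between the previous prefix of `a` and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The implementation keeps only two rows of the DP table, so it uses O(len(b)) memory. It reports only the distance. Recovering the actual edit script would require keeping the full table, or using a diff algorithm such as Myers'.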
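Both flashtext2 and the original flashtext package advertise keyword extraction that is faster than regex. One common way to get this is to build a trie of keyword phrases and scan the text once, keeping the longest match at each position. The sketch below assumes that approach. It is not flashtext2's API or its actual implementation. The names `build_trie` and `extract_keywords`, the `\w+` tokenization, and the word-index output format are all illustrative assumptions.

```python
import re
from typing import Dict, List, Tuple

_END = object()  # sentinel key marking "a keyword phrase ends here"


def _tokens(text: str) -> List[str]:
    # Assumed tokenization: lowercase word characters only
    return re.findall(r"\w+", text.lower())


def build_trie(keywords: Dict[str, str]) -> dict:
    """Hypothetical helper: word-level trie mapping phrases to clean names."""
    root: dict = {}
    for phrase, clean_name in keywords.items():
        node = root
        for word in _tokens(phrase):
            node = node.setdefault(word, {})
        node[_END] = clean_name
    return root


def extract_keywords(text: str, trie: dict) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: one left-to-right pass with longest-match.
    Returns (clean_name, start_word_index, end_word_index_exclusive)."""
    words = _tokens(text)
    found = []
    i = 0
    while i < len(words):
        node, j, longest = trie, i, None
        # Walk the trie as far as the text allows, remembering the
        # longest complete phrase seen so far
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if _END in node:
                longest = (node[_END], i, j)
        if longest:
            found.append(longest)
            i = longest[2]  # resume after the matched phrase
        else:
            i += 1
    return found


trie = build_trie({"new york": "New York", "york": "York",
                   "machine learning": "ML"})
print(extract_keywords("I love machine learning in New York.", trie))
# [('ML', 2, 4), ('New York', 5, 7)]
```

Longest-match scanning explains why "New York" wins over the shorter keyword "York" in the usage example. Each word is examined only along trie paths, so the cost grows with the text length, not with the number of keywords. A large alternation regex does not have this property.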